The purpose of this notebook is to analyze participant-level data collected for study SGC-3: The Insight Hypothesis
TODO QUESTIONS: 1. The main accuracy score is clearly not normally distributed. What distribution is most appropriate for modeling these data? 2. Is the linear regression an adequate fit?
#IMPORT DATA from fall and spring files
fall_participants <- "data/fall17_sgc3a_participants.csv"
spring_participants <- "data/spring18_sgc3a_participants.csv"
online_participants <- "data/fall21_sgc3a_participants.csv"
df_fall <- read.csv(fall_participants)
df_spring <- read.csv(spring_participants)
df_online <- read.csv(online_participants)
#indicate study modality
df_fall$mode <- "lab"
df_spring$mode <- "lab"
df_online$mode <- "online"
#Create combined data frame
df_subjects <- rbind(df_fall, df_spring, df_online) #, df_replication)
df_subjects$tri_min <- df_subjects$triangular_time / 1000 / 60
df_subjects$test_min <- df_subjects$tt_t / 1000 / 60
df_subjects$learn_min <- df_subjects$ts_t / 1000 / 60
#Create factors
df_subjects <- df_subjects %>% mutate(
subject = as.factor(subject),
session = as.factor(session),
term = as.factor(term),
condition = as.factor(condition),
explicit = as.factor(explicit),
impasse = as.factor(impasse),
axis = as.factor(axis)
)
df_fall <- df_subjects %>% filter(term =="fall17")
df_spring <- df_subjects %>% filter(term =="spring18")
df_online <- df_subjects %>% filter(term =="fall21")
df_lab <- df_subjects %>% filter(term != "fall21")
Data were initially collected in person (SONA groups in a computer lab) in Fall 2017. In Spring 2018, additional data were collected after small modifications were made to the experimental platform to increase the size of multiple-choice input buttons and to add an additional free-response question following the main task block. In Fall 2021, the study was replicated using an asynchronous, online SONA pool.
#MANUALLY INSPECT TERMS
df_subjects %>% group_by(term) %>%
dplyr::summarize(n=n())
## # A tibble: 3 × 2
## term n
## <fct> <int>
## 1 fall17 54
## 2 fall21 140
## 3 spring18 72
SGC3-A is a two-condition (111 = control vs. 121 = impasse) between-subjects experiment with 266 participants.
#MANUALLY INSPECT conditions
df_subjects %>% group_by(term,condition) %>%
summarize(n=n())
## # A tibble: 6 × 3
## # Groups: term [3]
## term condition n
## <fct> <fct> <int>
## 1 fall17 111 27
## 2 fall17 121 27
## 3 fall21 111 68
## 4 fall21 121 72
## 5 spring18 111 35
## 6 spring18 121 37
#Describe participants
subject.stats <- rbind(
"lab"= df_lab %>% select(age) %>% unlist() %>% favstats(),
"online" = df_online %>% select(age) %>% unlist() %>% favstats()
)
subject.stats$female <- c(
(df_lab %>% filter(sex=="Female") %>% count())$n,
(df_online %>% filter(sex=="Female") %>% count())$n
)
For in-person collection, 126 participants (60% female; ages 18–33), undergraduate STEM majors at a public American university, participated in person in exchange for course credit. Participants were randomly assigned to one of two experimental groups.
For the online replication, 140 participants (70% female; ages 18–31), undergraduate STEM majors at a public American university, participated online and asynchronously in exchange for course credit. Participants were randomly assigned to one of two experimental groups.
Response accuracy refers to how many questions the subject answers with a correct (triangular) interpretation.
#DESCRIBE distribution of triangular-correct scores
score.stats <- rbind(
"lab"= favstats(df_lab$triangular_score),
"online"= favstats(df_online$triangular_score)
)
score.stats
## min Q1 median Q3 max mean sd n missing
## lab 1 2 3 11 15 5.809524 4.893611 126 0
## online 1 2 2 8 15 5.007143 4.738479 140 0
For in-person collection, accuracy scores (n = 126) ranged from 1 to 15 (M = 5.81, SD = 4.89).
For the online replication, accuracy scores (n = 140) ranged from 1 to 15 (M = 5.01, SD = 4.74).
#VISUALIZE distribution of response accuracy
plab <- gf_histogram(~triangular_score, data = df_lab) %>%
gf_vline(xintercept = score.stats["lab",]$mean, color = "black") +
labs(title="Lab")
ponline <- gf_histogram(~triangular_score, data = df_online) %>%
gf_vline(xintercept = score.stats["online",]$mean, color = "black") +
labs(title="Online")
plot <-ggarrange(plab, ponline, common.legend = TRUE, nrow = 1, ncol =2)
annotate_figure(plot, top = text_grob("Score Accuracy by Study",
color = "black", face = "bold", size = 14))
However, QQ plots reveal that this variable does not approximate a normal distribution.
qqPlot(df_lab$triangular_score)
## [1] 7 10
qqPlot(df_online$triangular_score)
## [1] 8 11
TODO: These distributions are clearly not normal. Is a linear model appropriate here, or should we be using a Poisson distribution? See the due diligence.
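A formal normality test can complement the QQ plots. A minimal sketch using base R's `shapiro.test` on simulated count-like scores (the real input would be `df_lab$triangular_score`; the simulated data here are only a stand-in):

```r
# Shapiro-Wilk test as a numeric complement to the QQ plots.
# (Sketch on simulated count-like scores, not the study data.)
set.seed(42)
scores <- rpois(126, lambda = 5)  # right-skewed, count-like stand-in
shapiro.test(scores)              # a small p-value indicates departure from normality
```

Note that with n > 100, even mild departures from normality tend to produce small p-values, so the QQ plots remain the more informative diagnostic.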
TODO: Investigate very high and very low response times. TODO: Investigate appropriate models for response-time data (see: https://lindeloev.github.io/shiny-rt/).
#DESCRIBE distribution of response time
time.stats <- rbind(
"lab"= favstats(df_lab$tri_min),
"online"= favstats(df_online$tri_min)
)
time.stats <- time.stats %>% select(-missing) #don't need missing column
time.stats
## min Q1 median Q3 max mean sd n
## lab 3.770800 7.283617 8.804333 10.71089 19.98955 9.253020 2.902315 126
## online 1.983417 7.083704 8.811267 11.84312 27.87285 9.924459 4.527250 140
For in-person collection, response latencies for the stimulus blocks (n = 126) ranged from 3.77 to 19.99 minutes (M = 9.25, SD = 2.90).
For the online replication, response latencies (n = 140) ranged from 1.98 to 27.87 minutes (M = 9.92, SD = 4.53).
#VISUALIZE distribution of response time
plab <- gf_dhistogram(~tri_min, data = df_lab) %>%
gf_vline(xintercept = time.stats["lab",]$mean, color = "black") %>%
gf_fitdistr(color="red")+
labs(title="Lab")
ponline <- gf_dhistogram(~tri_min, data = df_online) %>%
gf_vline(xintercept = time.stats["online",]$mean, color = "black") %>%
gf_fitdistr(color="red")+
labs(title="Online")
plot <-ggarrange(plab, ponline, common.legend = TRUE, nrow = 1, ncol =2)
annotate_figure(plot, top = text_grob("Total Time by Study",
color = "black", face = "bold", size = 14))
qqPlot(df_lab$tri_min)
## [1] 67 41
qqPlot(df_online$tri_min)
## [1] 107 120
However, the distribution is clearly not normal, so it is appropriate to transform the response latency variable.
#Apply a log transform
df_lab$log_time <- log(df_lab$tri_min)
df_online$log_time <- log(df_online$tri_min)
#Calculate log transform stats
log_time.stats <- rbind(
"lab" = favstats(~log_time, data = df_lab),
"online" = favstats(~log_time, data = df_online))
log_time.stats
#Visualize log transformed data
plab <- gf_dhistogram(~log_time, data = df_lab) %>%
gf_vline(xintercept = ~log_time.stats['lab',]$mean, color = "black") %>%
gf_fitdistr(color="blue") %>%
gf_labs(title ="Lab", x="Log-transform (minutes)")
ponline <- gf_dhistogram(~log_time, data = df_online) %>%
gf_vline(xintercept = ~log_time.stats['online',]$mean, color = "black") %>%
gf_fitdistr(color="blue") %>%
gf_labs(title ="Online", x="Log-transform (minutes)")
plot <-ggarrange(plab, ponline, common.legend = TRUE, nrow = 1, ncol =2)
annotate_figure(plot, top = text_grob("Log-Transformed Time by Study",
color = "black", face = "bold", size = 14))
#VERIFY normality of resulting data with qqPlot
qqPlot(~log_time, data = df_lab)
qqPlot(~log_time, data = df_online)
TODO: The fit appears much better than before. Is this good enough?
The experimental hypothesis (H1) is that structuring the data to pose an impasse (condition 121) will produce significantly better performance than non-impasse (condition 111). The null hypothesis (H0) is that there will be no difference in performance between conditions.
#DESCRIBE scores by condition
score.cond.stats <- rbind(
"lab" = favstats(triangular_score ~ condition, data = df_lab),
"online" = favstats(triangular_score ~ condition, data = df_online)
)
score.cond.stats
## condition min Q1 median Q3 max mean sd n missing
## lab.1 111 1 2 2.0 3.75 15 4.500000 4.643875 62 0
## lab.2 121 1 2 6.5 12.00 15 7.078125 4.828174 64 0
## online.1 111 1 2 2.0 3.00 15 3.897059 3.996789 68 0
## online.2 121 1 2 3.0 11.25 15 6.055556 5.156396 72 0
For the in-person study, participants in the impasse group had higher scores on average (M = 7.08, SD = 4.83) than those in the non-impasse control group (M = 4.50, SD = 4.64).
For the online replication study, participants in the impasse group had higher scores on average (M = 6.06, SD = 5.16) than those in the non-impasse control group (M = 3.90, SD = 4.00).
#VISUALIZE scores by condition
condlabels <- c("111"="control", "121"="impasse")
plab <- gf_dhistogram( ~triangular_score, fill= ~condition, data = df_lab) %>%
gf_facet_grid(condition~., labeller=labeller(condition=condlabels)) %>%
gf_vline(xintercept = ~mean, data = score.cond.stats[c(1:2),], color = "blue")+
labs(title="In Person")
ponline <- gf_dhistogram( ~triangular_score, fill= ~condition, data = df_online) %>%
gf_facet_grid(condition~., labeller=labeller(condition=condlabels)) %>%
gf_vline(xintercept = ~mean, data = score.cond.stats[c(3:4),], color = "blue")+
labs(title="Online")
plot <-ggarrange(plab, ponline, legend = FALSE, nrow = 1, ncol =2)
annotate_figure(plot, top = text_grob("Score Accuracy by Condition",
color = "black", face = "bold", size = 14))
#VISUALIZE scores by condition
plab <- gf_boxplot(triangular_score ~ condition, data=df_lab) %>%
gf_jitter(color=~condition, alpha=0.5) +
labs (title = "In Person")
ponline <- gf_boxplot(triangular_score ~ condition, data = df_online) %>%
gf_jitter(color=~condition, alpha=0.5)+
labs(title ="Online")
plot <-ggarrange(plab, ponline, legend = FALSE, nrow = 1, ncol =2)
annotate_figure(plot, top = text_grob("Score Accuracy by Condition",
color = "black", face = "bold", size = 14))
#MODEL triangular score by condition with a linear model
m1 = lm(triangular_score ~ condition, data = df_lab)
summary(m1)
##
## Call:
## lm(formula = triangular_score ~ condition, data = df_lab)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.078 -3.078 -2.500 3.922 10.500
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5000 0.6018 7.478 1.2e-11 ***
## condition121 2.5781 0.8444 3.053 0.00277 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.738 on 124 degrees of freedom
## Multiple R-squared: 0.06993, Adjusted R-squared: 0.06243
## F-statistic: 9.323 on 1 and 124 DF, p-value: 0.00277
#with treatment coding, the first coefficient is the estimated mean for the first group; the second is the difference for the second group
# partition variance with anova
# anova(m1)
# supernova(m1) #runs but doesn't knit?
#calculate effect size
eta_squared(m1, partial = FALSE)
## # Effect Size for ANOVA (Type I)
##
## Parameter | Eta2 | 95% CI
## -------------------------------
## condition | 0.07 | [0.01, 1.00]
##
## - One-sided CIs: upper bound fixed at (1).
#plot residuals
#produce residual vs. fitted plot
plot(fitted(m1), resid(m1))
abline(0,0) #add a horizontal line at 0
plot(density(resid(m1)))
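The coefficient interpretation noted above can be verified directly: with R's default treatment contrasts, the intercept equals the first (control) group mean, and intercept + group coefficient equals the second group mean. A minimal self-contained sketch on simulated data (the real model is `m1` above; names and values here are illustrative):

```r
# Verify coefficient interpretation for a one-factor lm with treatment coding.
set.seed(2)
d <- data.frame(
  condition = factor(rep(c("111", "121"), each = 60)),
  triangular_score = c(rnorm(60, 4.5, 4.6), rnorm(60, 7.1, 4.8))
)
m <- lm(triangular_score ~ condition, data = d)
b <- coef(m)
# Intercept = control-group sample mean
all.equal(unname(b["(Intercept)"]),
          mean(d$triangular_score[d$condition == "111"]))
# Intercept + condition121 coefficient = impasse-group sample mean
all.equal(unname(sum(b)),
          mean(d$triangular_score[d$condition == "121"]))
```

For a single-factor model, the fitted group means equal the sample means exactly, so both checks return TRUE.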
For the in-person data, a linear model predicting triangular score accuracy from condition explains approximately 7% of the variance in triangular score, F(1,124) = 9.32, p = .003. TODO: Do these residuals look OK?
#MODEL triangular score by condition with a linear model
m2 = lm(triangular_score ~ condition, data = df_online)
summary(m2)
##
## Call:
## lm(formula = triangular_score ~ condition, data = df_online)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.056 -3.056 -1.897 2.103 11.103
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.8971 0.5614 6.941 1.39e-10 ***
## condition121 2.1585 0.7829 2.757 0.00662 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.63 on 138 degrees of freedom
## Multiple R-squared: 0.05221, Adjusted R-squared: 0.04534
## F-statistic: 7.601 on 1 and 138 DF, p-value: 0.006622
#with treatment coding, the first coefficient is the estimated mean for the first group; the second is the difference for the second group
# partition variance with anova
# anova(m1)
# supernova(m2) #runs but doesn't knit?
#calculate effect size
eta_squared(m2, partial = FALSE)
## # Effect Size for ANOVA (Type I)
##
## Parameter | Eta2 | 95% CI
## -------------------------------
## condition | 0.05 | [0.01, 1.00]
##
## - One-sided CIs: upper bound fixed at (1).
#plot residuals
#produce residual vs. fitted plot
plot(fitted(m2), resid(m2))
abline(0,0) #add a horizontal line at 0
plot(density(resid(m2)))
For the online replication, a linear model predicting triangular score accuracy from condition explains approximately 5% of the variance in triangular score, F(1,138) = 7.60, p = .007. TODO: Do these residuals look OK?
In SGC3-A, participants are given 15 questions. For the first 5 questions, they experience a ‘scaffolded’ version of the stimuli (control scaffold = condition 111, impasse scaffold = condition 121).
What if we consider only the final 10 test items? That is, treat the first 5 questions as scaffolding that gives students a chance to figure out the graph for themselves, and assess accuracy only on the final ten items. Is the experimental hypothesis still supported?
#DESCRIBE scores by condition
test.stats <- favstats(tt_n ~ condition, data = df_subjects)
#VISUALIZE scores by condition
gf_dhistogram( ~tt_n, fill= ~condition, data = df_subjects) %>%
gf_facet_grid(condition~.) %>%
gf_vline(xintercept = ~mean, data = test.stats, color = "blue")
#VISUALIZE scores by condition
gf_boxplot(tt_n ~ condition, data=df_subjects) %>%
gf_jitter(color=~condition, alpha=0.5)
mTest = lm(tt_n ~ condition, data = df_subjects)
summary(mTest)
##
## Call:
## lm(formula = tt_n ~ condition, data = df_subjects)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.654 -2.654 -1.277 3.346 6.723
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.2769 0.2673 12.258 < 2e-16 ***
## condition121 1.3775 0.3739 3.684 0.000278 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.048 on 264 degrees of freedom
## Multiple R-squared: 0.0489, Adjusted R-squared: 0.0453
## F-statistic: 13.57 on 1 and 264 DF, p-value: 0.0002782
anova(mTest)
## Analysis of Variance Table
##
## Response: tt_n
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 1 126.12 126.118 13.574 0.0002782 ***
## Residuals 264 2452.79 9.291
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# supernova(mTest)
A linear regression model predicting score (test blocks only) from condition explained approximately 5% of variance, F(1,264) = 13.57, p < .001.
#VISUALIZE distribution of response time
condlabels <- c("111"="control", "121"="impasse")
gfall <- gf_dhistogram(~tri_min, data = df_fall) %>%
gf_vline(xintercept = time.stats["lab",]$mean, color = "red")+ #fall sessions were in lab
facet_grid(~condition, labeller = labeller(condition=condlabels))+
labs(title="Fall", x="")
gspring <- gf_dhistogram(~tri_min, data = df_spring) %>%
gf_vline(xintercept = time.stats["lab",]$mean, color = "red") + #spring sessions were in lab
facet_grid(~condition, labeller = labeller(condition=condlabels))+
labs(title="Spring", x="")
gonline <- gf_dhistogram(~tri_min, data = df_online) %>%
gf_vline(xintercept = time.stats["online",]$mean, color = "red") +
facet_grid(~condition, labeller = labeller(condition=condlabels))+
labs(title="Online")
plot <- ggarrange(gfall, gspring, gonline, common.legend = TRUE, ncol = 1, nrow =3)
annotate_figure(plot, top = text_grob("Response Time by condition and term",
color = "red", face = "bold", size = 14))
### 2.1 Response Latency vs. Accuracy
Are response time (on the triangular blocks (test and learn)) and response accuracy correlated?
#VISUALIZE relationship between response time and accuracy
gf_jitter(tri_min ~ triangular_score, data = df_subjects) %>%
gf_lm()
#VISUALIZE relationship between response time and accuracy by condition
gf_jitter(tri_min ~ triangular_score, color = ~condition, data = df_subjects) %>%
gf_facet_grid(~condition) %>%
gf_lm()
From visual inspection, it does not appear that score and time are strongly correlated.
#
# #simple linear model predicting triangular score from triangular time
# m1 = lm(triangular_score ~ tri_min, data = df_subjects)
# summary(m1)
# cor.test(df_subjects$triangular_score, df_subjects$tri_min)
#
# #linear model with log-transformed triangular time
# mT = lm(triangular_score ~ log(tri_min), data = df_subjects)
# summary(mT)
# cor.test(df_subjects$triangular_score, log(df_subjects$tri_min)) #log_time only exists on df_lab/df_online
However, a simple linear regression indicates a significant (though small) negative correlation between score accuracy and response time, r = -0.18, t(124) = -2.04, p < .05. TODO: the degrees of freedom (124) suggest these statistics were computed on the lab sample only; recompute for the combined data.
Do response times differ by condition?
time.stats <- favstats(tri_min ~ condition, data = df_subjects)
time.stats
## condition min Q1 median Q3 max mean sd n
## 1 111 3.146717 7.658642 9.316917 11.23459 27.87285 9.996164 3.991173 130
## 2 121 1.983417 6.868392 8.363025 10.93544 26.13407 9.233849 3.690071 136
## missing
## 1 0
## 2 0
gf_dhistogram(~tri_min, fill = ~condition, data = df_subjects) %>%
gf_facet_grid(condition~.)
The average total response latency (entire 15-question block) for the control condition was slightly higher (M = 10.00 min, SD = 3.99) than for the impasse-scaffold condition (M = 9.23 min, SD = 3.69).
#simple linear model
m1 <- lm(tri_min ~ condition, data = df_subjects)
summary(m1)
##
## Call:
## lm(formula = tri_min ~ condition, data = df_subjects)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.2504 -2.3535 -0.7315 1.6319 17.8767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.9962 0.3368 29.680 <2e-16 ***
## condition121 -0.7623 0.4710 -1.618 0.107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.84 on 264 degrees of freedom
## Multiple R-squared: 0.009824, Adjusted R-squared: 0.006073
## F-statistic: 2.619 on 1 and 264 DF, p-value: 0.1068
#linear model on log-transformed data
mT <- lm(log(tri_min) ~ condition, data = df_subjects)
summary(mT)
##
## Call:
## lm(formula = log(tri_min) ~ condition, data = df_subjects)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.46400 -0.21203 -0.00751 0.23148 1.11442
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.23356 0.03335 66.964 <2e-16 ***
## condition121 -0.08474 0.04665 -1.817 0.0704 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3803 on 264 degrees of freedom
## Multiple R-squared: 0.01234, Adjusted R-squared: 0.008603
## F-statistic: 3.3 on 1 and 264 DF, p-value: 0.07043
The difference in average response time is not statistically significant, F(1,264) = 2.62, p = .11 (log-transformed: p = .07), with a linear model predicting response time from condition explaining only 1% of variance.
Does adding response time to the model help predict score accuracy?
#simple model predicting triangular score from condition
m1 = lm(triangular_score ~ condition, data = df_subjects)
summary(m1)
##
## Call:
## lm(formula = triangular_score ~ condition, data = df_subjects)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.537 -3.537 -2.185 3.463 10.815
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1846 0.4107 10.189 < 2e-16 ***
## condition121 2.3521 0.5744 4.095 5.62e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.683 on 264 degrees of freedom
## Multiple R-squared: 0.05972, Adjusted R-squared: 0.05616
## F-statistic: 16.77 on 1 and 264 DF, p-value: 5.618e-05
anova(m1) # supernova(m1)
## Analysis of Variance Table
##
## Response: triangular_score
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 1 367.7 367.73 16.769 5.618e-05 ***
## Residuals 264 5789.4 21.93
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#model including condition and triangular time
m2 = lm(triangular_score ~ condition + tri_min, data = df_subjects)
summary(m2)
##
## Call:
## lm(formula = triangular_score ~ condition + tri_min, data = df_subjects)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.504 -3.253 -2.035 3.514 11.465
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.61541 0.85100 6.599 2.27e-10 ***
## condition121 2.24304 0.57434 3.905 0.00012 ***
## tri_min -0.14313 0.07468 -1.917 0.05635 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.659 on 263 degrees of freedom
## Multiple R-squared: 0.07268, Adjusted R-squared: 0.06563
## F-statistic: 10.31 on 2 and 263 DF, p-value: 4.907e-05
anova(m2) # supernova(m2)
## Analysis of Variance Table
##
## Response: triangular_score
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 1 367.7 367.73 16.939 5.171e-05 ***
## tri_min 1 79.8 79.76 3.674 0.05635 .
## Residuals 263 5709.6 21.71
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#model including condition and triangular time (log-transformed)
mT = lm(triangular_score ~ condition + log(tri_min), data = df_subjects)
summary(mT)
##
## Call:
## lm(formula = triangular_score ~ condition + log(tri_min), data = df_subjects)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.975 -3.258 -2.017 3.583 11.329
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.7732 1.7374 3.898 0.000123 ***
## condition121 2.2539 0.5765 3.910 0.000118 ***
## log(tri_min) -1.1589 0.7559 -1.533 0.126442
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.671 on 263 degrees of freedom
## Multiple R-squared: 0.06805, Adjusted R-squared: 0.06097
## F-statistic: 9.603 on 2 and 263 DF, p-value: 9.438e-05
anova(mT) # supernova(m2)
## Analysis of Variance Table
##
## Response: triangular_score
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 1 367.7 367.73 16.8546 5.39e-05 ***
## log(tri_min) 1 51.3 51.28 2.3505 0.1264
## Residuals 263 5738.1 21.82
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model predicting accuracy from condition and response latency explains approximately 7% of variance, while the model with condition alone explains approximately 6%; however, the response latency predictor does not reach significance at the 5% alpha level (p = .056).
TODO: What is the appropriate inference to draw from this model comparison?
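One concrete way to frame this comparison is a nested-model F test (and AIC). A minimal self-contained stand-in for m1 (condition only) versus m2 (condition + tri_min), using simulated data with hypothetical values; the real call would pass the fitted models from the chunks above:

```r
# Nested-model comparison: does adding a predictor significantly reduce
# residual sum of squares? (Simulated stand-in data.)
set.seed(1)
d <- data.frame(
  condition = factor(rep(c("111", "121"), each = 50)),
  tri_min   = rexp(100, rate = 0.1)
)
d$triangular_score <- 4 + 2 * (d$condition == "121") + rnorm(100, sd = 4)
m1 <- lm(triangular_score ~ condition, data = d)
m2 <- lm(triangular_score ~ condition + tri_min, data = d)
anova(m1, m2)  # F test on the reduction in residual SS from adding tri_min
AIC(m1, m2)    # lower AIC favors the model with a better fit/complexity trade-off
```

For nested lm fits, `anova(m1, m2)` gives the same p-value as the added predictor's t test, so here it would echo the p = .056 result rather than change the inference.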
#DESCRIBE distribution of learning block times
learn.time.stats <- favstats(df_subjects$learn_min)
learn.time.stats
## min Q1 median Q3 max mean sd n missing
## 0.6142333 2.353563 3.325692 4.6456 12.96385 3.78128 2.046571 266 0
#VISUALIZE distribution of learning block time
gf_dhistogram(~learn_min, data = df_subjects) %>%
gf_vline(xintercept = mean(df_subjects$learn_min), color = "red") %>%
gf_labs(title = "Distribution of LEARN block response time ")
Total learning block times ranged from 0.61 to 12.96 minutes (M = 3.78, SD = 2.05).
#DESCRIBE distribution of test block times
test.time.stats <- favstats(df_subjects$test_min)
test.time.stats
## min Q1 median Q3 max mean sd n missing
## 1.369183 4.250808 5.42165 6.454271 18.86497 5.825129 2.5289 266 0
#VISUALIZE distribution of test block time
gf_dhistogram(~test_min, data = df_subjects) %>%
gf_vline(xintercept = mean(df_subjects$test_min), color = "red") %>%
gf_labs(title = "Distribution of TEST block response time ")
Total testing block times ranged from 1.37 to 18.86 minutes (M = 5.83, SD = 2.53).
Are time spent on learning blocks and time spent on testing blocks correlated?
#VISUALIZE test v learning block time
gf_point(test_min ~ learn_min, color = ~condition, data = df_subjects) %>%
gf_facet_grid(~condition) %>%
gf_lm()
m1 <- lm( test_min ~ learn_min , data = df_subjects)
summary(m1)
##
## Call:
## lm(formula = test_min ~ learn_min, data = df_subjects)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5047 -1.3944 -0.3009 0.8506 11.2802
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.90532 0.29798 13.106 < 2e-16 ***
## learn_min 0.50771 0.06933 7.323 2.94e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.31 on 264 degrees of freedom
## Multiple R-squared: 0.1688, Adjusted R-squared: 0.1657
## F-statistic: 53.62 on 1 and 264 DF, p-value: 2.944e-12
anova(m1) #supernova(m1)
## Analysis of Variance Table
##
## Response: test_min
## Df Sum Sq Mean Sq F value Pr(>F)
## learn_min 1 286.11 286.114 53.622 2.944e-12 ***
## Residuals 264 1408.65 5.336
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cor.test(df_subjects$test_min, df_subjects$learn_min)
##
## Pearson's product-moment correlation
##
## data: x and y
## t = 7.3227, df = 264, p-value = 2.944e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3057160 0.5061395
## sample estimates:
## cor
## 0.4108799
A linear model predicting test block time from learning block time reveals that learning block time explains approximately 17% of the variance in testing block time (F(1,264) = 53.62, p < .001), with roughly a 30 s increase in testing time for every 1 minute increase in learning time (b = 0.51). The two variables have a moderate, significant correlation, r = 0.41, t(264) = 7.32, p < .001, 95% CI [0.31, 0.51].
What sample size is needed to detect a moderate effect (f = 0.25) at a 0.05 alpha level with 80% power?
#K = 2 groups, f = 0.25 is moderate effect size
pwr.anova.test(k=2,f=0.25 ,sig.level=.05, power=.8)
##
## Balanced one-way analysis of variance power calculation
##
## k = 2
## n = 63.76561
## f = 0.25
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
The studies should aim to have at least 64 subjects per group (128 total) to detect a moderate-sized main effect between two conditions.
Data from the first term only (Fall 2017) were analyzed and presented as a paper at CogSci 2019, in support of the experimental hypothesis (H1: impasse is an effective scaffold). Combined across terms, the prior analyses ALSO support the experimental hypothesis. Do the spring-only data also support the H1 hypothesis?
#simple linear model predicting triangle score from condition for spring data only
mspring = lm(triangular_score ~ condition, data = df_subjects %>% filter(term=='spring18'))
#partition variance
anova(mspring)# supernova(m2)
## Analysis of Variance Table
##
## Response: triangular_score
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 1 162.94 162.936 7.6013 0.006622 **
## Residuals 138 2958.06 21.435
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(mspring)
## 2.5 % 97.5 %
## (Intercept) 2.7869064 5.007211
## condition121 0.6104631 3.706530
A linear model predicting triangular_score from condition for only the data collected in the Spring term does not reveal a significant difference between conditions. However, the previous power analysis suggests that this test may be underpowered, as it includes data from only 72 subjects (total) instead of the recommended 128 (64 per group). It seems more appropriate to combine data from Fall and Spring.
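The underpowered-sample concern can be quantified: assuming the same moderate effect (f = 0.25), the achieved power for the spring-only sample (about 36 per group) follows from the same `pwr` call used earlier. A sketch:

```r
# Achieved power for n = 36 per group under an assumed moderate effect.
library(pwr)
pwr.anova.test(k = 2, n = 36, f = 0.25, sig.level = 0.05)
# power comes out well below the conventional 0.8 target
```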
#VISUALIZE scores by condition
gf_dhistogram( ~triangular_score, fill= ~condition, data = df_subjects) %>%
gf_facet_grid(condition~term)
#VISUALIZE scores by condition
gf_boxplot(triangular_score ~ condition, data=df_subjects) %>%
gf_facet_grid(~term) %>%
gf_jitter(color=~condition, alpha=0.5)
By visual inspection, the distributions of accuracy scores by condition appear similar across terms.
It is therefore reasonable to conclude that data collected in Spring 2018 can be combined with Fall 2017.
The accuracy score variable is not normally distributed. Should we be modelling these data with an alternative distribution (such as Poisson, for count data)?
library(fitdistrplus)
plotdist(df_fall$triangular_score, histo = TRUE, demp = TRUE)
fit_n <- fitdist(df_fall$triangular_score, "norm")
fit_p <- fitdist(df_fall$triangular_score, "pois")
summary(fit_n)
## Fitting of the distribution ' norm ' by maximum likelihood
## Parameters :
## estimate Std. Error
## mean 5.759259 0.6707361
## sd 4.928884 0.4742820
## Loglikelihood: -162.7588 AIC: 329.5175 BIC: 333.4955
## Correlation matrix:
## mean sd
## mean 1.000000e+00 -2.260365e-09
## sd -2.260365e-09 1.000000e+00
summary(fit_p)
## Fitting of the distribution ' pois ' by maximum likelihood
## Parameters :
## estimate Std. Error
## lambda 5.759259 0.3265776
## Loglikelihood: -195.1632 AIC: 392.3264 BIC: 394.3154
gofstat(list(fit_n, fit_p), fitnames = c("norm", "pois"))
## Goodness-of-fit statistics
## norm pois
## Kolmogorov-Smirnov statistic 0.286271 0.4263866
## Cramer-von Mises statistic 1.040138 2.5698714
## Anderson-Darling statistic 5.824593 27.2235450
##
## Goodness-of-fit criteria
## norm pois
## Akaike's Information Criterion 329.5175 392.3264
## Bayesian Information Criterion 333.4955 394.3154
par(mfrow=c(2,2))
plot.legend <- c("normal", "poisson")
denscomp(list(fit_n, fit_p), legendtext = plot.legend)
cdfcomp (list(fit_n, fit_p), legendtext = plot.legend)
qqcomp (list(fit_n, fit_p), legendtext = plot.legend)
ppcomp (list(fit_n, fit_p), legendtext = plot.legend)
Try a Poisson distribution, via a Poisson regression:
m2 = glm(triangular_score ~ condition, family = poisson(link = "log"),data = df_subjects)
summary(m2)
##
## Call:
## glm(formula = triangular_score ~ condition, family = poisson(link = "log"),
## data = df_subjects)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.705 -1.549 -1.190 1.256 4.083
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.43141 0.04287 33.386 < 2e-16 ***
## condition121 0.44603 0.05443 8.194 2.53e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1021.5 on 265 degrees of freedom
## Residual deviance: 952.5 on 264 degrees of freedom
## AIC: 1809.5
##
## Number of Fisher Scoring iterations: 5
#plot residuals
plot(fitted(m2), resid(m2))
abline(0,0) #add a horizontal line at 0
plot(density(resid(m2)))
m3 = glm(triangular_score ~ condition, family = quasipoisson(link = "log"),data = df_subjects)
summary(m3)
##
## Call:
## glm(formula = triangular_score ~ condition, family = quasipoisson(link = "log"),
## data = df_subjects)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.705 -1.549 -1.190 1.256 4.083
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.43141 0.08719 16.417 < 2e-16 ***
## condition121 0.44603 0.11070 4.029 7.33e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 4.13573)
##
## Null deviance: 1021.5 on 265 degrees of freedom
## Residual deviance: 952.5 on 264 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
#plot residuals
plot(fitted(m3), resid(m3))
abline(0,0) #add a horizontal line at 0
plot(density(resid(m3)))
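The quasi-Poisson dispersion estimate (about 4.1) is well above 1, indicating substantial overdispersion relative to a Poisson model. A negative binomial model is a common alternative worth considering here; a minimal sketch on simulated overdispersed counts (the real call would use triangular_score and df_subjects; the simulated values are illustrative only):

```r
# Negative binomial regression as an alternative for overdispersed counts.
library(MASS)  # for glm.nb
set.seed(7)
d <- data.frame(condition = factor(rep(c("111", "121"), each = 70)))
# size (theta) controls overdispersion: smaller size => more overdispersion
d$score <- rnbinom(140, mu = ifelse(d$condition == "121", 6, 4), size = 1.5)
m_nb <- glm.nb(score ~ condition, data = d)
summary(m_nb)  # theta is the estimated dispersion parameter
AIC(m_nb)      # compare against the Poisson model's AIC
```

If the negative binomial AIC is clearly lower than the Poisson AIC on the real data, that would support modeling the accuracy counts with the negative binomial family.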